seed keyword
An Improved Method for Class-specific Keyword Extraction: A Case Study in the German Business Registry
Meisenbacher, Stephen, Schopf, Tim, Yan, Weixin, Holl, Patrick, Matthes, Florian
The task of keyword extraction is often an important initial step in unsupervised information extraction, forming the basis for tasks such as topic modeling or document classification. While recent methods have proven to be quite effective in the extraction of keywords, the identification of class-specific keywords, or only those pertaining to a predefined class, remains challenging. In this work, we propose an improved method for class-specific keyword extraction, which builds upon the popular KeyBERT library to identify only keywords related to a class described by seed keywords. We test this method using a dataset of German business registry entries, where the goal is to classify each business according to an economic sector. Our results reveal that our method greatly improves upon previous approaches, setting a new standard for class-specific keyword extraction.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
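To make the idea of seed-guided, class-specific keyword extraction in the entry above concrete, here is a minimal sketch. It is not the authors' method and does not use KeyBERT: character trigram overlap stands in for the sentence embeddings a real system would use, and the function names, the example registry text, and the seed word are all hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n=3):
    """Bag of character n-grams, padded so prefixes and suffixes count."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def class_specific_keywords(text, seeds, top_k=3):
    """Rank candidate words by their best similarity to any seed keyword."""
    candidates = {tok.strip(".,;:()").lower() for tok in text.split()}
    candidates = {c for c in candidates if len(c) > 3}  # crude stop-word filter (assumption)
    seed_vecs = [char_ngrams(s) for s in seeds]
    scored = [(max(cosine(char_ngrams(c), sv) for sv in seed_vecs), c) for c in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(word, round(score, 3)) for score, word in scored[:top_k]]


# Hypothetical registry-style text and seed keyword, for illustration only.
entry = "Entwicklung und Vertrieb von Software sowie Beratung im Bereich Softwareentwicklung"
keywords = class_specific_keywords(entry, seeds=["software"])
print(keywords)
```

The seed word itself ranks first and the compound "softwareentwicklung" second, because the compound shares most of the seed's trigrams. An embedding model would also capture related words with no surface overlap, which this toy similarity cannot.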
LGDE: Local Graph-based Dictionary Expansion
Schindler, Dominik J., Jha, Sneha, Zhang, Xixuan, Buehling, Kilian, Heft, Annett, Barahona, Mauricio
Expanding a dictionary of pre-selected keywords is crucial for tasks in information retrieval, such as database query and online data collection. Here we propose Local Graph-based Dictionary Expansion (LGDE), a method that uses tools from manifold learning and network science for the data-driven discovery of keywords starting from a seed dictionary. At the heart of LGDE lies the creation of a word similarity graph derived from word embeddings and the application of local community detection based on graph diffusion to discover semantic neighbourhoods of pre-defined seed keywords. The diffusion in the local graph manifold allows the exploration of the complex nonlinear geometry of word embeddings and can capture word similarities based on paths of semantic association. We validate our method on a corpus of hate speech-related posts from Reddit and Gab and show that LGDE enriches the list of keywords and achieves significantly better performance than threshold methods based on direct word similarities. We further demonstrate the potential of our method through a real-world use case from communication science, where LGDE is evaluated quantitatively on data collected and analysed by domain experts by expanding a conspiracy-related dictionary.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Germany > Berlin (0.04)
- Law (1.00)
- Health & Medicine (1.00)
- Law Enforcement & Public Safety (0.93)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
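The LGDE entry above describes diffusion on a word similarity graph to find the semantic neighbourhood of seed keywords. The sketch below shows that general idea under loud assumptions: it swaps in personalized PageRank (a random walk with restart) as a simple stand-in for the paper's local community detection, and the graph, its edge weights, and the function names are made up for illustration.

```python
def personalized_pagerank(graph, seeds, restart=0.15, iters=50):
    """Random walk with restart on a weighted graph, restarting at the seeds."""
    nodes = list(graph)
    restart_dist = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart_dist)
    for _ in range(iters):
        nxt = {n: restart * restart_dist[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            for neighbour, weight in graph[n].items():
                # Spread this node's mass to neighbours in proportion to similarity.
                nxt[neighbour] += (1 - restart) * score[n] * weight / total
        score = nxt
    return score


def expand_dictionary(graph, seeds, top_k=2):
    """Return the top_k non-seed words by diffusion score."""
    score = personalized_pagerank(graph, seeds)
    ranked = sorted((n for n in graph if n not in seeds), key=lambda n: -score[n])
    return ranked[:top_k]


# Hypothetical symmetric word similarity graph (weights are invented).
graph = {
    "hoax": {"fake": 0.9, "plot": 0.7},
    "fake": {"hoax": 0.9, "plot": 0.6},
    "plot": {"hoax": 0.7, "fake": 0.6, "garden": 0.1},
    "garden": {"plot": 0.1, "flower": 0.9},
    "flower": {"garden": 0.9},
}
scores = personalized_pagerank(graph, {"hoax"})
expanded = expand_dictionary(graph, {"hoax"})
print(expanded)
```

Because the walk follows weighted paths rather than applying a single similarity threshold to direct neighbours, words in the seed's tightly connected neighbourhood accumulate score, while "garden" and "flower", reachable only through one weak edge, stay low.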
One-Click Midjourney Prompts with ChatGPT
The gold rush has started, and this time, it's not gold that's driving the frenzy -- it's AI. The world is currently witnessing an AI revolution that is changing the way we live, work, and interact. With advancements in machine learning, natural language processing, and computer vision, AI is becoming more intelligent and sophisticated, enabling it to take on complex tasks and decision-making processes. As AI technology continues to evolve, we also need to evolve in order to harness its full potential, avoid becoming obsolete, and mitigate potential risks. ChatGPT is an AI language model developed by OpenAI that can generate human-like responses to text-based prompts.
- Education (0.52)
- Health & Medicine > Surgery (0.33)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)
Multi-Modal Prediction
Multi-modal learning is about gathering and integrating information from different data modalities, such as tabular, text, image, and video. Industrial applications are increasing, especially in the e-commerce and energy sectors, where multi-modal prediction is the only way to solve the business problem, so the need to implement it in the best way possible keeps growing. The current scope covers tabular, text, and image-based execution, with a comprehensive explanation of the methodology. There are two primary challenges: first, how to interpret text and images so that our model can learn from them and predict successfully; second, how to consider and connect information from different modalities.
Merchandise Recommendation for Retail Events with Word Embedding Weighted Tf-idf and Dynamic Query Expansion
We rely on item retrieval from marketplace inventory. With feedback to expand query scope, we discuss keyword expansion candidate selection using word embedding similarity, and an enhanced tf-idf formula for expanded words in search ranking. We rank all retrieved items by the sum of tf-idf scores from matched words, and keep the items with total tf-idf scores above a threshold. The retrieval based system works well to discover relevant [...]
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.15)
- North America > United States > California > Santa Clara County > San Jose (0.05)
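The ranking step described in the entry above (sum tf-idf scores over matched words, keep items above a threshold) can be sketched as follows. The excerpt does not give the enhanced formula, so the weighting here is an assumption: each expanded word's tf-idf contribution is multiplied by a similarity weight, while original query words get weight 1.0. The idf variant, item texts, weights, and threshold are likewise illustrative.

```python
from collections import Counter
from math import log


def rank_items(items, query_weights, threshold):
    """Score each item by the weighted sum of tf-idf over matched query words."""
    tokens = {name: text.lower().split() for name, text in items.items()}
    doc_freq = Counter(w for toks in tokens.values() for w in set(toks))
    n_items = len(items)
    ranked = []
    for name, toks in tokens.items():
        counts = Counter(toks)
        score = 0.0
        for word, weight in query_weights.items():
            if counts[word]:
                tf = counts[word] / len(toks)
                idf = log(1 + n_items / doc_freq[word])  # one common idf variant (assumption)
                score += weight * tf * idf
        if score > threshold:  # drop items that match too weakly
            ranked.append((name, round(score, 3)))
    return sorted(ranked, key=lambda pair: -pair[1])


# Hypothetical inventory; "pumpkin" is an expanded word with an invented similarity of 0.8.
items = {
    "A": "halloween witch costume hat",
    "B": "pumpkin carving kit set",
    "C": "garden hose reel",
}
query_weights = {"halloween": 1.0, "pumpkin": 0.8}
ranked = rank_items(items, query_weights, threshold=0.1)
print(ranked)
```

Item A matches the original query word and ranks first, item B matches only the down-weighted expansion word and ranks second, and item C matches nothing and falls below the threshold.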
Council Post: A Beginner's Guide To SEO Keyword Research In 2021
Amine is the CEO of IronMonk, a digital marketing agency specializing in SEO & CMO at Regal Assets, an IRA company. There used to be a time when you could install a free Chrome browser plug-in, scrape all the competitive keywords you need, throw them into an article a couple of dozen times and then immediately rank for high-volume search terms after hitting "publish" on your WordPress site. Those days are no longer, and that's not such a bad thing. Google has gone to great lengths to improve the internet user experience over the past couple of decades. If you want to create rankable content these days, you need to provide exceptional value for your reader.
21 Actionable SEO Techniques That Work GREAT in 2018
Because today I'm going to show you the exact SEO techniques that I use to generate 151,981 unique visitors per month. All of these proven strategies are working GREAT in 2018. And here are the tactics you'll learn about in this post. A while back, Google announced their RankBrain algorithm. And as it turns out, this update was a HUGE game changer. Google RankBrain is Google's first machine learning algorithm. As you can see, the happier you make Google's users, the higher you'll rank. Sure, backlinks, keywords and other traditional signals are still important. But RankBrain is quickly taking over. In fact, Google went on to say that RankBrain was one of their "top 3" ranking signals. How do you optimize your site for Google RankBrain? First, improve your organic click-through-rate (CTR). Google RankBrain wants to see that lots of people are clicking on your site in the search results. "Let's boost it to the top of the page so it's easier to find."
Data Scientist Shares his Growth Hacking Secrets
In this article, we discuss various strategies used to generate exponential traffic growth while preserving traffic quality and user loyalty. Raw data science: getting the right data sets and leveraging them. Playing with various tools and APIs: designing an automated machine-to-machine communication service between Hootsuite and Twitter / LinkedIn, based on insights automatically distilled from the following data sources: (1) data obtained via the Google Analytics API (traffic statistics about 50,000 live DSC articles), and (2) data collected via a web crawler written in Python. A blend of high-level (strategic) data science and low-level (tactical or operational) data science. In the end, relatively little coding is involved in the process. Domain expertise and smart innovation play a critical role. Optimizing parameters of the statistical process used to select articles, create tweets, and schedule them, using experimental design and A/B testing. Artificial intelligence: detection and removal of articles that are time-sensitive, automated creation of relevant hashtags for selected tweets, and creation of a taxonomy of all our articles using a simple indexing classification scheme. Smart analytics-driven advertising on Twitter, using a good list of data science thought leaders worth following as our core data set for advertising purposes. The creation of this list is an interesting data science project in itself.
- Information Technology > Information Management (1.00)
- Information Technology > Data Science (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence (1.00)
4 Easy Steps to Structure Highly Unstructured Big Data, via Automated Indexation
You have gathered gigabytes or terabytes of unstructured text, for instance by scraping the Internet, or pieces of email from your employees or users, or tweets, or millions of products that you want to categorize (only the product description and product name are available, sometimes with typos). Now you want to make sense of it and extract value, possibly by designing a nice search engine so that your customers can easily find your products. The core algorithm that you need is an automated cataloguer, also called an indexer. I am going to explain in layman's terms how it works. First, let's assume that the data consists of [...] Typically, these "pages" are stored as large repositories containing millions or billions of (sometimes compressed) text files spread across a number of folders and sub-folders, or multiple servers.
- Information Technology > Data Science > Data Mining > Big Data (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.56)